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From the underlying Master equations we derive one-dimensional stochastic processes that de- 
scribe generalized ensemble simulations as well as tempering (simulated and parallel) simulations. 
The representations obtained are either in the form of a one-dimensional Fokker-Planck equation 
or a hopping process on a one- dimensional chain. In particular, we discuss the conditions under 
which these representations are valid approximate Markovian descriptions of the random walk in 
order parameter or control parameter space. They allow a unified discussion of the stationary dis- 
tribution on, as well as of the stationary flow across each space. We demonstrate that optimizing 
the flow is equivalent to minimizing the first passage time for crossing the space, and discuss the 
consequences of our results for optimizing simulations. Finally, we point out the limitations of these 
representations under conditions of broken ergodicity. 



I. INTRODUCTION 

The effective simulation of complex thermal systems 
like proteins and glasses is a constant challenge in con- 
temporary computational physics. Markov chain Monte 
Carlo simulation techniques for these systems have un- 
dergone remarkable advances in the last decades. Two 
main classes that have evolved are generalized ensemble 
and parallel tempering methods. 

In the generalized ensemble (GE) approach [l| the goal 
is to sample the state space of a physical system so that 
particularly rare but important states, e.g. low energy 
or barrier states, are encountered frequently. A variety 
of weight functions have been tested, as well as methods 
for iteratively improving these functions [2| . 

A persistent problem of such simulations is that re- 
laxation is slow due to barriers and bottlenecks, and 
for a long time it was not clear whether and how they 
can be controlled by using particular weight functions 
on the usual order parameter spaces. Parallel tempering 
(PT) - sometimes also called replica exchange method - 
promised a way out of this dilemma [3|, |J, [5| . Here sim- 
ulations are performed in parallel at different values of a 
control parameter, most often the temperature. At cer- 
tain times the current conformations of replicas at neigh- 
boring control parameter values are exchanged according 
to a generalized Metropolis rule. Thereby an individual 
replica could perform an additional random walk in con- 
trol parameter space and - due to shorter relaxation times 
in some control parameter regime - explore state space 
more evenly. 

However, the problem of slow relaxation arises also this 
time in the form of a slow and possibly uneven random 
walk through control parameter space. At least part of 
this problem is related to finding an efficient discretiza- 
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tion of control parameter space. 

Increasing the flow through order parameter space in 
GE sampling as well as through control parameter space 
in PT was always an incentive. However, usually it was 
discussed only informally, and only recently Trebst et al. 
[H 0, B Q have made an attempt to look at that prob- 
lem systematically. Instead of concentrating on the sta- 
tionary distributions that arise from the sampling, they 
concentrated on the stationary flow across order parame- 
ter and control parameter space. In order to optimize the 
flow they derived weight functions and control parameter 
discretization schemes. 

In this contribution we want to give that approach a 
more fundamental basis. We will first concentrate on the 
underlying Master equations describing GE and PT sim- 
ulations. From them we will derive in a systematic way 
the one-dimensional stochastic equations that form the 
basis for a flow analysis in order parameter and control 
parameter space. These equations will also allow us to in- 
vestigate connections between flow analysis and another 
concept describing the dynamics of stochastic processes, 
the first passage time (FPT). The one-dimensional rep- 
resentations are valid approximations for the underlying 
simulations only under certain conditions. If they are vi- 
olated, optimization schemes may still fail. For parallel 
tempering we will find a criterion from which the validity 
can be determined. 

We will focus in the next section on generalized en- 
semble sampling, while parallel tempering is the focus of 
the third section. We will close with a discussion of the 
effects of broken ergodicity on our results and an outlook. 



II. GENERALIZED ENSEMBLE SAMPLING 

Markov chain Monte Carlo simulations utilize a certain 
move set in combination with an acceptance probability, 
most often of Metropolis form to impose stochastic 
dynamics on a physical system. A move from state s to 
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s' is accepted with the probability 

Pm(s -> s') = min [1, w(s')/w(s)] . (1) 

The original choice for the weight function w(s) is the 
thermal or Boltzmann weight 

w(s) cx exp{-0E(s)] , (2) 

resulting in canonical sampling at inverse temperature 
(3 = l/k E T. However, other choices are possible as 
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well, depending on which particular aspect of state space 
should be emphasized. We will assume in the following 
that the weight function is based only on the energy E(s) 
of the state s, and we will use w(s) = w [E(s)] = w(E) 
interchangeably. 

In order to get a deeper understanding of simulations 
based on Eq. (fT]) we should look at their description via a 
Master equation in state space. P(s, t) is the probability 
to be in state s at time t, and its evolution in discrete 
computer time is given by 



P(s,t + 1) 



E 



P(s',t)W s (s' ->a)+P(a,t) 



i - E w & 

s' 



where the sums are over all possible states. The transition probabilities W(s — > s') in this equation are 



W s {s 



ip(s 



Pm(s -> s') 



(3) 



(4) 



where ip(s — > s') is the characteristic function of the move set with iji{s — * s') = 1 if the move from s to s' is allowed 
and zero otherwise. Provided the move set is balanced and ergodic, the stationary distribution reached by this 
Markov chain is Po(s) cx w(s) because of detailed balance. Note however that in addition to ip(s — ► s') — ip(s' — > s), 
for (s, s') with ip(s — > s') = 1 the property J^ s „ ip(s — > s") = J2 S " ^( s ' ~ ¥ s ") mus t be fulfilled. Particularly the latter 
property, i.e. every state must connect to the same number of other states, is sometimes overlooked. 

Equation (|3|) is an exact description of the simulation pro cess and it is also the basis for more thorough investigations 
in the mathematics of Metropolis simulations (Til . Il2l. Il3l. H^ |. However from a physicist's point of view a reduced 
description in terms of slow order parameters is of more interest. Prominent among the order parameters chosen is 
the energy itself. Using adiabatic elimination of fast degrees of freedom, see the Appendix \K\ an approximate Master 
equation in energy space can be formulated 



P(E,t+l) 



P(E',t)W E (E' 

E'^E 



E)+P(E,t) 
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E'^E 



E') 



(5) 



The effective transition probabilities We(E — > E') can be derived from Eqs. ([3]and and are given in that appendix, 
too. Note however that, in contrast to Eq. ([3]), Eq. (O is an approximation, valid only if all other degrees of freedom 
relaxate much faster than the energy. Nevertheless, even if relaxation orthogonal to the energy is slow, Eq. (O can 
still be viewed as a Markovian approximation to the fully non-Markovian process. Due to the degeneracy of states 
with energy, the stationary distribution of Eq. ^ is now 



P (E) cx n(E)w(E) 



(6) 



Here, n(E) is the density of states and we have assumed that the weight function for the Metropolis algorithm is 
based on energies only, as mentioned above in the discussion of Eq. (JTJ) . 

Coarse graining time and energy leads to a form of the Master equation that is continuous in both variables 



§i P (E,t) = J P(E',t)R E (E' -» E) - P(E,t) J R E (E - E') 



(7) 



where the transition probabilities have been replaced by 
rates Re{E — > E'). Note that for various systems state 
space and energy E could have been continuous from the 
start, i.e. the sums in Eqs. ([3|) and ([5|) could have al- 



ready been integrals. The continuous form of the Master 
equation in energy space is now the starting point for our 
final approximation. If the transition rates R E {E — ► E') 
are strongly peaked around E' sa E, a second order par- 
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tial differential equation can be derived from Eq. J7J). 
by various techniques, e.g. Kramers-Moyal expansion 
[3 EE El- This Fokker-Planck equation 16] for P{E, t) 
is given by 



d_ 

dE 



HE) 



P(E,t) 



(8) 



and we have written it already in a form that sepa- 
rates static and dynamic properties. D{E) is the energy- 
dependent diffusion coefficient that describes the local 
mobility and F(E) is the drift term which in one dimen- 
sion can always be derived from a potential, F(E) = 
— (d / dE)U (E) . In particular the stationary distribution 
of Eq. ([5]) is fully determined by this potential only, 



P {E) ocexp [-U(E)] 



(9) 



It is important to emphasize again that the transi- 
tion from the Master equation ([7]) to the Fokker-Planck 
equation ([8]) is possible only if the the transition rates 
Re{E — > £"), i.e. the underlying transition probabilities 
We(E — >• E'), are sufficiently local in the energy. Only 
in that limit the Fokker-Planck equation is an effective 
description of the more general Eq. ([7]). Nevertheless, 
even if there are non-local contributions to the transition 
rates, Eq. ([5]) can be viewed as the best local approxima- 
tion to Eq. ©. 

Eq. ([H]) can be written in a more compact form using 
the stationary distribution Pq(E), 



d_ 
di 



P(E,t) 



^D { E W E)^P 0{ Ey 



P(E,t). 



(10) 

In this form the fact that Pq{E) is the stationary dis- 
tribution can be seen immediately from the vanishing of 
the rightmost derivative on the rhs if P(E, t) is replaced 
by Po(E). This equation will be the basis for our further 
analysis of distribution and flow in energy space. 

The stationary distribution in energy space, Pq(E), 
is actually the histogram H(E) of the energy distribu- 
tion that is observed in an actual simulation. It is still 
given by Eq. ©, i.e. by the Metropolis weight func- 
tion w(E) multiplied with the density of states n(E). 
By an appropriate choice of the weight function any his- 
togram can be produced in the simulation. The usual 
choice of Boltzmann weights leads to the canonical dis- 
tribution, H(E) oc n(E) exp(— (3E), while the choice 
w(E) oc l/n(E) leads to a flat histogram H{E) = const 
0, EH . A flat histogram would be most appropriate to 
describe the properties of the system in question over 
a wide temperature range with equal accuracy, provided 
the Monte Carlo error at each energy is the same. Various 
methods have been discussed to actually obtain approx- 
imations to n{E) by iteratively improving simulations 
@,Ell- However, all of them are still plagued by the prob- 
lem that equilibration in the system can be slow, in par- 
ticular when a wide energy range is considered [20j, \2l\ . 
Moreover, it turned out that - even if a flat histogram is 
reached - the error distribution is not flat at all 6, 221. 



It was the important new step by Trebst et al. @ to 
look systematically at the flow in energy space in simu- 
lations. Instead of monitoring the histogram H(E), cor- 
responding to the stationary distribution, they added a 
label to the system and changed its value whenever it 
reached minimal and maximal values in the energy, i.e. 
E m i n and E max . By counting just these labels at each 
energy E, the distributions of systems moving up and 
down in energy, denoted by n up (E) and nd own (E), re- 
spectively, can be monitored. The original histogram is 
recovered from H(E) = n up (E) + nd OW n{E). However, 
in this way it is also possible to measure the fraction of 
systems moving up, 



fup(E) — 



'up 



(E) 



Tl U p 

(E) 
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and, correspondingly, that of systems moving down in en- 
ergy, fdown(E). Note that f up (E) + fdown(E) = 1. Both 
distributions actually are the stationary distributions of 
probability flow in the systems with boundary condi- 
tions /up(£mm) — 1, fup{E max ^ — and fdowni,E m i n ^ — 

0, fdown(E m ax) = 1, respectively. 

This flow in energy space can now be analyzed using 
the above Fokker Planck equations. Equations. ([HI [TUJ) 
are actually continuity equations for the probability flow, 



(12) 



with J(E,t) the probability current. The current be- 
tween E mm and E max < 
stationary solution of (fTJ 



tween E min and E max can now be determined as the 



J = 



D(E)P (E)-^P (E)- 1 



Pj(E) = const , (13) 



with Pj(E) being the stationary distribution for the flow 
under the above boundary conditions. Note that the sta- 
tionary distribution Pq(E) discussed before is actually 
the solution of Eq. (TTJ]) for zero flow! Integrating that 
equation we obtain 



Pj(E) Pj(E mm ) 



P (E) P {E min ) 

J / dE'— ; 

J Emin D(E')P (E>) 



(14) 



Using any of the above boundary conditions the total 
flow across energy space is then given by 

\J\ = ([D(E)Po(E)}- 1 Y 1 (15) 



where we used the notation < . >= f E max dE. 

In order to optimize the weight function w{E) used for 
the simulation to reach maximal flow across energy space, 
Trebst et al. maximized Eq. (|15[) under the constraint of 
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keeping the distribution Po(E) normalized, which is done 
by adding a Lagrange multiplier 



SPo(E) 



[D(E)P (E)Y 



X(P (E)} 



(16) 



Stationary flow is not the only concept that can be 
used to investigate the stochastic dynamics in a system. 
Another often used concept is the mean first passage time 
(FPT) 23]. It is the average time a particle starting at 
one end of a diffusion space needs to reach the other 
end for the first time. The total mean first passage time 
for crossing energy space from E m i n to E max in both 
directions, 

7" T^Emin y E raax ^j -\- T{E raax ► -^mm) (-^) 

can be derived from Eqs (|8I1 0[) and is given by [24| 

- = Cl dE mkmJL dE ' p ° (E ' ,+ 



= ([D(E)P (E)}- 1 ) (P (E)) 



(18) 



Note that Po(E) does not have to be normalized here. 
Minimization of r with respect to Pq{E) leads to 



SPo(E) 



[D(E)P (E)]- 1 )(Po(E)) 



= 



(19) 



Both variational equations, (|T6|) and (|19p. lead to the 
same solution 



Po,opt oc 



1 



VD{E) 



(20) 



which is already derived in [6[ from Eq. (fTT))) . This 
current-optimized stationary distribution leads to a sym- 
metric form of the Fokker Planck equation, 



d_ 
di 



P(E,t) 



d 



d 



P(E,t). (21) 



In order to reach such a current optimized histogram, 
H op t(E), in an actual simulation, the weight function has 
to be chosen accordingly. For this purpose it is necessary 
to obtain the local diffusion coefficient from a simulation 
using some initial weight function w(E). Differentiating 
Eq. |U]), D(E) is obtained from 



D(E) 



Po{E) 



d Pj{E) 



'dEP (E) 
where the we used the fact that 

Pj(E) 



[H(E)f'(E)}- 1 (22) 



f(E) 



Po(E) 



holds. Both, fup(E) as well as fdown(E) can be used for 
that purpose. However, some smoothening may have to 
be performed to obtain a smooth derivative, as discussed 
in [6j. Since Po l0 pt(E) — n(E)w op t(E), the optimized 
weight function is given by 



Wopt(E) 



1 



n(E)y/D(E) 



(24) 



Using the fact that n(E) is obtained from the actual sim- 
ulation by n(E) — H(E)/w(E), we arrive at the iteration 
formula 



Wopt(E) =w(E)y 



' f'(E) 
H(E) 



At the fixed point of this iteration f'(E) 
to H(E) = 1/ 



(25) 



H(E), leading 



D(E) as required, see Eq. 
In an actual simulation situation one iteration might 
not suffice. This is discussed in detail in @, @, @| • In the 
following we will attempt to apply the same approach to 
tempering simulations. 



III. PARALLEL TEMPERING 

Generalized ensemble sampling can be extended by 
adding movement in control parameter space. In addi- 
tion to moves among different states of the system at a 
particular value of the control parameter, moves along 
one or more control parameter directions are possible. 
The motivation behind such an extension is that relax- 
ation in some control parameter regime may be faster, 
thereby facilitating relaxation in the whole state+ control 
parameter space [3j. 

Usually, the motion along control parameter directions 
is not made continuous. Instead, a list of monotonically 
increasing or decreasing values f3 n is chosen, thereby in- 
troducing control parameter hopping. The most com- 
monly used control parameter is temperature, although 
other choices are possible, too (2f|. Simulations at a par- 
ticular control parameter value are performed using the 
Metropolis criterion fl} with a weight function w(f3,s), 
as before. Although Boltzmann weights are usually cho- 
sen, leading to canonical simulations at each parameter 
value, in principle any weight function is possible. 

Parallel tempering is the parallellized extension of a 
method called simulated tempering which we will sketch 
only briefly. In simulated tempering (26l . [27j . a single 
instance of the system is simulated only. After certain 
times, an attempt is made to change the control parame- 
ter value. In order to ensure that the stationary distribu- 
tion at each control parameter value is given by w(/3, s), 
the Metropolis criterion, 



(23) PM [(p, s )^((3',s)]=mm(l,^^l) , (26) 
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is used for the acceptance of a control parameter change. 
In order to ensure appreciable exchange between differ- 
ent control parameter values using Boltzmann weights 
@, the weight function has to be adjusted by a control 
parameter dependent function g((3), 

w(p,s)=e X p[-pE(s)+g(P)] . (27) 

This function <?(/?) actually determines the distribution 
of the single system among the control parameter values 
and - up to an additive constant - the optimal choice is 
the free energy g((3) = (3F((3) of the system analyzed [27l |. 
However, the problem of determining g([3) has hampered 
this approach. 

This problem is solved in parallel tempering. Here, 
copies of the system are simulated in parallel at each of 
the various control parameter values. At certain times 
exchanges of replicas with neighboring control parameter 
values are attempted. In order to ensure that the station- 
ary distribution at a control parameter value (3 is given 
by w(/3,s), the generalized Metropolis criterion 

ua \ n > 'm • (a w(/3, s')w(P',s) \ 
PM ms)^0,s)]=mm{l, w{Mwif3%sl) ) (28) 

has to be used for the acceptance of such an attempt. It 
can be seen easily that any function g(f3) in the weight 
function (|27|) . in particular the one that was necessary 
to ensure equilibration among control parameter values 



in simulated tempering, simply drops out in the parallel 
form. Thereby, the problem of determining the free en- 
ergy in order to optimize the simulation vanishes. In the 
case of Boltzmann weights (|2"B"|) reduces to 

p M [(P., s) -» (/?', s')\ = min [1, exp(A/?A£] (29) 

with A/3 = /3'-l3 and AE = E' — E. 

We will concentrate on temperature hopping in the 
following and choose the list of inverse temperatures 
(3o > Pi > ... > (3m- In a parallel implementation simula- 
tions at a particular value (3 n are usually run on a partic- 
ular node of the parallel computer, conveniently labeled 
n. In order to simplify the notation, we will therefore ab- 
breviate (3 n by n whenever possible and also use "node" 
synonymously with " control parameter value" . 

It is sufficient to follow only a single replica through 
state and control parameter space, since all replicas are 
equivalent. For times between replica exchanges, simula- 
tions are performed at each node independently, and the 
time evolution of the distribution function P p„, s),t] 
at a particular node n is described by the Master equa- 
tion (|3|) with the respective transition probabilities de- 
termined by the appropriate temperature (3 n . For times 
t = mT, m = 1,2, replica exchange is attempted. For 
this time step the Master equation in state and temper- 
ature space is 



I 

P [(/?„, S ),t + 1] = i P [Wn-l,s'),t] W s [(&,-!,*') -> (/?„, S)] + P [(/3 n+1 , S '), t] W s [(& +1 , s') (f3 n , S )]} + 

s' 

P[(P n ,s),t]^{l-W s [(P n ,s)^((3 n ^,s')]-W s [((3 n ,s)^(/3 n+1 ,s')}} . (30) 

s' 

The transition probabilities are given by 

W s [(f3,s)^(P',s')} = -^p M [((3,s)^(0',s')} for s ? s' . (31) 

Here, £1 is the normalization by state space and reflects the fact that any conformations can be exchanged, which is 
different from the case of GE where the move set was restricting possible conformation changes, see Eq. ([!]). N takes 
into account that just for one random neighboring pair of nodes a replica exchange is attempted at time t = mT. 
However, other strategies are possible, too, that would lead to a different normalization constant. Due to the exchange 
of replicas in parallel tempering the transition probabilities are symmetric, i.e. 

W s [(/3, s) - (/?', s')} = W s [(/?', s') - (/?, *)] . (32) 

Elimination of fast degrees of freedom orthogonal to the energy is possible in the same way as it was discussed in the 
last section. This leaves us with 

P p n) E),t+1] = J2i p [(Pn-i,E'),t\ W E [((3 n -i,E') -> (J3 n , E)] + P [(/3 n+1 ,E'),t] W E [((3 n +i,E') -> (f3 n ,E)}} + 

E' 

P [(/?„, E) , t] {1 - We Pn, E) , E 1 )] - We Pn, E) (/3 n+1 , E 1 )}} (33) 

E' 

The symmetry (|32p naturally leads to a similar symmetry for the transition rates between nodes in reduced, i.e. 
energy, space, 

W E [{(3, E) -> (/?', E')] = We [(/?', E') -» (0, E)} . (34) 
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In order to finally derive an effective Master equation for hopping in temperature space, we have to additionally 
assume fast relaxation in energy space, i.e. on times scales t < T. This means that at any particular node we assume 
to have reached the respective equilibrated distribution P (/3,E). Using similar reasoning as in Appendix lAl for GE, 
this last approximation leads to the final form of the Master equation in temperature space on a coarse grained time 
scale t->t/TN, 

P(p n ,t + 1) = P(p n - 1 ,t)Wp(p n -i -+ 0) + P(J3 n+1 ,t)W f) (J3 n + 1 ->0) + 

P(P„,t) [1 - Wpifi -> /?„_!) - W p {f3 -> p n+l )] (35) 

In a way similar to the derivation of Eq. (|A3|) , effective transition probabilities can be derived from the equilibrated 
distributions at a node, 

Wptf ^P') = I dE ( dE'P (/3,E) PM {(E,P) (E',{3')} P (P',E') . (36) 



We will discuss in Appendix [B] the properties of these 
effective transition probabilities in particular situations. 
Finally, the symmetry of the transition probabilities 



W (J3 -> p') = Wp (P' - P) 



(37) 



holds analogously here, too. 

Due to this last property, the stationary distribution 
for PT can be derived easily from Eq. (|35p and is given 
simply by 



Po(P) = const , 



(38) 



i.e. in an equilibrated simulation every single replica 
appears on each control parameter node with the same 
probability, 1/(N +1) with N + 1 being the number of 
nodes. This result is an important simplification of the 
situation over the case of simulated annealing and over 
GE, and is due to the construction of replica exchange. 

Flow between the control parameter nodes can now be 
analyzed, too. Here, replicas reaching node or N are 
labelled and the respective distributions over the nodes 
can be monitored. Using the same notation as before, 
n U p{i) being the number of replicas at node i that came 
from node 0, and nd own (i) being the number of replicas 
at node i that came from node N, we can measure the 
fraction of replicas moving up 



T^up (y) 



(39) 



and a corresponding quantity for those moving down, 
fdown{i)- Both distributions are stationary distribu- 
tions of probability flow between temperature nodes, 
with boundary conditions f up (0) = 1, f up (N) = and 
fdown(0) = 0,f down (N) = 1, respectively. As before, 
they can be analyzed by looking at the stationary solu- 
tion of the underlying stochastic equation (|33| . It can be 
written as 

P(Pn,t+ 1) - P(Pn,t) = J(P n ,t) ~ J(/3„-i,t) (40) 

which is the discrete form of the continuity equation (|12p . 
The discrete case current J(/3„,t) is given by 

J{fin,t) - P(l3 n+U t)Wp(P n+ i P n ) - 



P(P n ,t)Wf3{P n -> p n+ i) 



(41) 



Consequently, the stationary current J is determined 
from 



J = Pj(p n+1 )W {p n+1 p n ) 

Pj(Pn)W {P n P n+l ) 

= const 



(42) 



with Pj(Pn) being the stationary distribution for flow un- 
der the above boundary conditions. Taking into account 
the symmetry properties of the transition probabilities, 
the stationary distribution for constant current between 
nodes /3q and /3jv is given by 



Pj(Pn 

with the current 

J = 



i 



- W (Pi -> p l+ i) 



(43) 



7V-1 

E 



i 



w p {Pi^p i+l ) 



(44) 



Due to the simple form of the stationary distribution 
among nodes (l38|) . the analytic forms for Pj(i) and J 
are considerably simpler than for GE. 

Again, as before with GE, a concept different from 
stationary flow can also be analyzed, the total mean first 
passage time to cross the network of nodes. For the gen- 
eral hopping process (f35|) , this is given by [H, E3 



r = r(0 -> N) + t(N -> 0) 

N-l . i 



N 

E 



i 



~[ P O (P t )W (pi -> Pi-!) j- 
^ 1 

W)Wp(Pi^Pi 
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This final result is practically a discrete version of 
Eq. (|18[) . Taking into account that the stationary dis- 
tribution in PT is constant, we finally obtain a result 
that is just the inverse of Eq. (|44[) . 



N-l 

E 

i=0 



1 



(46) 



Here is an important difference to GE. There the local 
diffusion coefficient was fixed by the move set and - as 
it turned out in actual simulations - mostly independent 
from the chosen weight function. The stationary distri- 
bution, however, was free to be chosen by varying the 
weight function. 

In the case of PT this is exactly the opposite. Here 
the stationary distribution is fixed by construction of the 
replica exchange process. However, by adjusting the con- 
trol parameter intervals, the transition probabilities can 
be chosen relatively freely. Hence, we are interested in 
obtaining the optimal distribution of transition probabil- 
ities that maximizes the flow across the control param- 
eter space. Since we are mainly interested in local vari- 
ations of the optimized transition probabilities, i.e. in 
deviations from an average value (that will be actually 
determined afterwards), we keep the average transition 
probability constant via a Lagrangian multiplier, i.e. 



fN-X 

E 



i 



N-X \ 

A^^(ft^M =0 (-47) 

i=0 ) 

An equivalent equation results for optimizing r. 

It is easily seen that optimizing the current as well as 
the mean first passage time simply gives a constant tran- 
sition probability between neighboring nodes the whole 
range of control parameter values, 



W opt = const 



(48) 



as optimal solution. Consequently, from (|4"5|) we can con- 
clude that the optimal flow distribution among the nodes 
is linear in the node number. 



Pj{0n) = n/N 



(49) 



Therefore, the temperature spacing is optimal if such a 
flow distribution together with constant transition prob- 
abilities can be obtained in an actual simulation. The 
linear dependence of the flow distribution on the node 
number, Eq. (|49|) . was obtained for PT already in 
by mapping PT onto the Fokker-Planck equation for GE, 
Eq ijSJ), and assuming a particular temperature depen- 
dence for the local diffusion coefficient. Here however, we 
see that it follows directly from the hopping-dcscription 
of PT. In addition, we obtain its equivalence to constant 
transition probabilities, Eq. 



X 




FIG. 1: Sketch of state space for generalized ensemble sam- 
pling (GE) in the case of broken ergodicity; X denotes any 
degree of freedom orthogonal to the control parameter (en- 
ergy). 



The iteration scheme used in Refs. 0, HI, for as- 
signing temperatures to nodes appeared to exhibit 
fast convergence to the optimal behavior of [49j We 
can rephrase it here without having to recur to some 
intermediate local diffusivity (and we stick to our use of 
inverse temperatures as example control parameters): 

(i) a particular set of control parameters pa > Pi > 
... > betciN — 1 > Pn gives rise to a flow distribution 
/u„(0) - 1 > fu P (l) > ... > fu P (N - 1) > f up (N) = 0; 

(ii) these latter values give rise to stepwise de- 
fined function <?[/], with g[f{i)] — Pi, in par- 
ticular g[l] = Pq and g[l] = Pn, and lin- 
ear interpolation in between these values; 

(iii) the new control parameter values are 
determined from this function by P[ = 
g[i/N],i = 1,...,N — I, keeping /3 and Pn fixed. 
This procedure is actually illustrated quite nicely in 
Fig. 2 of Ref. 0, and we can refrain from repeating it 
here. 



It is important to note at this point that the above 
results, in particular the equivalence of constant transi- 
tion probabilities and a flow distribution that is linear in 
the node number, depend on the validity of the under- 
lying one-dimensional representation of the simulation 
process. It turns out that, while the flow distribution 
Eq. 25] is readily attainable in actual simulations, this 
is usually not accompanied by acceptance probabilities 
that are constant over the whole system 0, H, Q . In the 
following final section we will discuss possible reasons for 
such a discrepancy. 
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node N 



FIG. 2: Sketch of bi- and multi-furcations in the case broken 
ergodicity for parallel tempering (PT); for certain nodes the 
system partitions into several disjoint free energy wells. 



IV. SUMMARY, DISCUSSION, AND OUTLOOK 

The goal of most recent advances in Markov chain 
Monte Carlo sampling is to analyze and increase the flow 
through state space. To do so, heuristic equations have 
been used to describe the flow in reduced state space 
along a slow order parameter, e.g. the energy, in gen- 
eralized ensemble sampling (GE) and among nodes per- 
forming simulations at various control parameter values 
in parallel tempering (PT) [|| 0, H[ • In this contribution 
we have derived such one-dimensional stochastic equa- 
tions for GE and PT sampling from the underlying Mas- 
ter equations. Using these stochastic equations, weight 
functions for GE and strategies for finding optimal con- 
trol parameter values for PT can be devised that optimize 
the flow through order parameter and control parameter 
space, respectively. We have also demonstrated that op- 
timization of flow is equivalent to minimizing the first 
passage time to cross the system. 

All considerations in the previous sections were based 
on the assumptions that the Fokker-Planck (jHJ) or hop- 
ping (|35[) equations are a valid Markovian representation 
of the underlying more complex dynamics. That, how- 
ever, is true only if the approximations discussed there 
apply. 

In the case of GE these approximations are that relax- 
ations in the degrees of freedom orthogonal to the energy 
are fast, together with the locality of transitions. Since 
the ultimate goal of deriving Eq 1(5)) is the optimization 
of flow through state space, a violation of the latter con- 
dition is not detrimental. Non-locality in the energy of 
the move set usually leads to faster relaxation since state 
space is connected more densely. This is for example one 
reason for the success of the Swendsen-Wang algorithm 



[2c| . Although in such a situation Eq. |S} may not be 
able to capture the full dynamics correctly, it neverthe- 
less is still able to identify local bottlenecks. Optimizing 
the flow according to the methods discussed can handle 
these local bottlenecks in the transition between neigh- 
boring energy values. 

However, slow relaxation orthogonal to the energy 
leads to a more complicated situation. Now it may not 
be possible anymore to reach all values of the additional 
degrees of freedom by moves local in the energy. Instead, 
detours via other - usually high energy - areas of the state 
space have to be performed. This leads to the comb-like 
structure of the accessible state space sketched in Fig. [1] 
It describes the situation that free energy basins at con- 
stant energy are disconnected. 

Actually, it is this feature of state space partitioning 
that lead us to the requirement of large flow between 
low and high energy areas of the state space in the first 
place. Only for a large flow the "teeth" of the comb- like 
structure in Fig. [1] can be sampled adequately. Never- 
theless, it also leads to the situation that the effective 
one-dimensional Fokker-Planck equation © is only the 
best Markovian one-dimensional approximation of an un- 
derlying effectively higher-dimensional process. 

This situation is even clearer in the case of PT. If 
the relaxation at a particular control parameter value is 
faster than the time scale of hopping in control parameter 
space, then the requirements for the analysis performed 
in the previous section are fulfilled. However, if that is 
not the case the state space at such a node partitions 
into disjoint free energy basins that do not communi- 
cate. Viewed over the whole control parameter range, 
we are dealing with a hierarchical network of free energy 
basins as sketched in Fig [5J Such a situation has been 
aptly termed broken ergodicity [2^ . |30| and was discussed 
in the field of glassy dynamics several years ago. 

In principle, the topology of the tree-like control+state 
space depends on the time scale of the control parameter 
hopping. If relaxation is possible at all nodes, then no bi- 
furcations occur and the system is just a one-dimensional 
hopping chain as it was analyzed in the previous section. 
However, this is true only in the limit of infinite time 
between replica exchange steps, T — > oo. Practically the 
topology of the branching will be the same over a wide 
range of time scales and only the position of the branch- 
ing nodes may vary. 

For a truly one-dimensional system the optimized tran- 
sition probabilities are constant and - equivalently - 
the optimal flow stationary distribution is f(n) oc n, 
Eq. (|49p . Under conditions of broken ergodicity, the situ- 
ation is more complicated. Now, several transition rates 
between neighboring nodes may have to be taken into ac- 
count, describing exchange along the different branches 
depicted in Fig. [5] Moreover, observed acceptance rates 
are weighted averages of these various transition proba- 
bilities. Therefore, the equivalence of constant observed 
acceptance ratios to a flow distribution linear in the 
node number may no longer hold. This discrepancy has 
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been observed already in Refs. 0, H, H- While satisfy- 
ing Eq. (j4*9"|) was possible by an appropriate choice of 
node temperatures, constant acceptance rates were not 
obtained concomitantly. Such a result can be used as 
a clear signature of broken ergodicity occurring in PT. 
In contrast to GE, where such a clear criterion is not 
available yet, PT has a particular advantage here. 

Naturally, the question arises how to optimize flow in 
PT under conditions of broken ergodicity. Control via the 
choice of node temperatures is somewhat limited in such 
a situation, since changes may affect different transition 
probabilities differently. Nevertheless, even in such a sit- 
uation the choice of a flow distribution f(n) oc n still as- 
sures that the flow of replicas along the main branch, i.e. 
between the lowest and the highest temperature node, 
is still optimal. In contrast, the effect of making appar- 
ent acceptance ratios constant - if possible at all - is 
unclear and depends on a knowledge of the particular 
structure of ergodicity breaking. In the final analysis, 
however, optimizing flow under such conditions means 
that, in addition to the flow between the lowest and the 
highest node, also flow among side branches has to be be 
considered. In order to assure that flow among all side 
branches is optimized, too, a more detailed flow analysis, 



i.e. determining the flow matrix between all individual 
nodes, would have to be performed. 

We believe that an additional advantage of PT is that 
such branching situations can be analyzed directly, with- 
out resorting to an actual system. In a way, it will be 
possible to simulate the simulations to investigate possi- 
ble flow behaviors. By analyzing the Master equations 
modelling the hierarchical broken ergodicity networks, 
conclusions about the behavior of actual simulations can 
be drawn [3l[ Thereby, the present results open the way 
to investigate the effects of broken ergodicity in GE and 
PT in a more systematic way. 
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APPENDIX A: ADIABATIC ELIMINATION OF FAST DEGREES OF FREEDOM 

As a first step to eliminate the fast degrees of freedom in Eq. ([3]) , we have to single out the slow degree of freedom 
by appropriate labelling. Therefore, we replace the state label s by a more detailed one that consists of the energy E, 
i.e. the slow degree of freedom, and an additional label se that designates all microstates with energy E, s= (E,se)- 
Equation ([3|) is then replaced by 

p i(E, SE ),t + 1] = J2 E p ss ')< *] w ° se ') -> s£ )i + 

E'ytE s E , 

P [{E, s E ), t] I 1 - S Ws SE ^ SE '^ \ ' ( A1 ) 

where the sums are over all possible energies E and states s e for each energy. If relaxation among the microstates 
for a particular energy is fast, each of these states will assume the same probability, and we can approximate 

P[(E,s E ),t]^^-rP(E,t) , (A2) 

with N(E) being the number of states, i.e. J2 S = N(E), introduced to ensure correct normalization, J2e t) = 1. 
This approximation is the crucial step, since it allows us to sum over all microstates se for each energy in Eq. (|A1|) . 
After carrying out such a summation, we arrive at the Master equation for the energy, Eq. ([5]), with the transition 
probabilities We(E — > E') given by 

W E (E^E>) = ^J2J2 W sl( E > SE )^( E '> SE 'y ■ (A3) 

Note the asymmetry with respect to E and E' that is due to the non-constant density of states. 

A more rigorous and systematic treatment, which would also allow the derivation of corrections, would involve 
projection operator techniques as they are used, e. g.. in Chap. 6.4 of Ref. UtI However, since the above approach 
suffices for our purposes, we refrain from embarking on such a more detailed elaboration. 



APPENDIX B: TRANSITION PROBABILITIES strongly on the - usually unknown - distributions or- 
FOR PARALLEL TEMPERING 



In GE, a general analysis of the effective transition 
probabilities W(E — > E 1 ) is not easy since they depend 
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thogonal to the energy, combined with a possibly sparse 
move set. In contrast to GE, PT allows more insight into 
the effective transition probabilities that govern the tem- 
perature hopping. Since transitions are possible between 
any energy values, the complications due to a particular 



sparse move set do not arise. The only requirement is the 
assumption of equilibration at each temperature. In this 
limit, we can calculate the effective transition probability 
by 



W(P^p') = dE dE , Po(P,E)pM{(E,p)^(E , ,(3')]Po(P' 1 E') , 



(Bl) 



with Pq(/3, E) being the equilibrated distribution at j3 and pm given by Eq. 



There have been several approaches to evaluate that 
formula. Predescu et al. [12] and Kofke [H| emphasize 
the importance of taking into account the asymmetry of 
an actual distribution, having a low energy cutoff and 
an exponential tail at high energies. Nevertheless, Kone 
and Kofke [34 ] later use an approximation based on a 
Gaussian approximation, i.e. symmetric without cutoff 
and non-exponential tail, together with the assumption 
of constant specific heat over the entire range. All au- 
thors limit themselves to unimodal distributions in their 
analysis and do not question the peak is quadratic in 
energy. 

However, the distributions change dramatically at crit- 
ical values of the control parameter, i.e. at first and 
second order phase transitions. While at second order 
phase transitions the functional form of the peak changes 
from quadratic to quartic, at first order phase transitions 
the energy distributions become bimodal. With respect 
to the distribution tails, on the other hand, since the 
goal is anyhow to optimize the transition probabilities, 
one should try to avoid control parameter intervals so 
large that the explicit structure of the tails become rel- 



evant. Moreover, Eq. (|B1[) is anyhow only valid in the 
limit of fast relaxation at a node. So it is from the outset 
only an approximation to the actually observed transition 
rate. Since, in order to optimize the flow, large values of 
the transition probabilities are sought for, a quantitative 
analysis makes sense only for the cases where the overlap 
is appreciable, i.e. for small temperature differences. 

We therefore add here our approach to evaluate (|Bip 
using the first order approximations to these distribu- 
tions, i.e. Gaussians, 



P (/?, E) oc exp 



[E-E((3)Y 
2a 2 ((3) 



(B2) 



with E((3) the average energy and <r 2 ((3) = \E— E(f3)] 
the energy fluctuations at (3. However, we avoid the unre- 
alistic and very limiting assumption of a constant specific 
heat 0. 

Assuming (3 < (3' , using a step function approximation 
of error functions that result from the inner integrals, 
and performing a symmetric evaluation we obtain 



W(f3 -► &) 



I ex P 



A(3AE + -A(3 2 (a- 



2 + erf 



f AE + A(3(a 2 + a' 2 ) 



erfc 



/ AE 



erfc 



AE 
V2<r> 



V2cr 



erf 



/ AE + Al3{a 2 + a' 2 ) ' 
V V2a> 



(B3) 



where we have used the abbreviations A(3 = (3 — [3' and 
AE = E((3) - E(f3'). We can now employ the fact that 
the specific heat is given by 

c=P(T) = -^E- {( 3) , (B4) 

as well as by 



c = f3 2 [E-E(pf =p 2 a 2 ((3) (B5) 



to obtain a relationship between the derivative of the 
average energy and the energy fluctuations 

-±E(P)=[E-E((3)] 2 . (B6) 

Using 

E'(f3)^^-E([3)=-<T 2 (f3) , (B7) 
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this relation can be employed to approximate the differ- 
ence of the average energies by 

AE = E((3)-E(f3') 



1 



E'((3) + E'((3')) A/3 



= + a' 2 ) A(3 



(B8) 



Note that we have assumed (3 < ft , i.e. A/3 < and, 
consequently, AE > in the evaluation of Eq. (|B1[) . This 
result shows that the exponent in Eq. (|B3[) cancels and, 
using erf(— x) = — crf(x), we have the final approximate 
expression for the transition probability 



W(f3^f3') 



rfc 



/ AE 



erfc 



/ AE 



erfc 
erfc 



/ |A/3|(a 2 +a' 2 ) 

/ |A/?|(q 2 +a' 2 ) 
V 2 v / 2cr' 



(B9) 



For small values of AE/a this can be further approxi- 
mated to 



w{p^p) « i 



1 / A£ A£ 



2?r VV2cr \/2cr' 



1 



/ AE 



6V2tt 
0(A£; ;5 ) 



AE 1 
V2a' 
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